Abstract

A wide variety of problems in machine learning, including exemplar clustering, document 
summarization, and sensor placement, can be cast as constrained submodular maximization 
problems. Unfortunately, the resulting submodular optimization problems are often too large to 
be solved on a single machine. We develop a simple distributed algorithm that is embarrassingly
parallel and achieves provable, constant-factor, worst-case approximation guarantees. In our
experiments, we demonstrate its efficiency on large problems with different kinds of constraints,
with objective values always close to what is achievable in the centralized setting.


1 Introduction 

A set function $f : 2^V \to \mathbb{R}_{\geq 0}$ on a ground set $V$ is submodular if
$f(A) + f(B) \geq f(A \cap B) + f(A \cup B)$ for any two sets $A, B \subseteq V$. Several problems
of interest can be modeled as maximizing a submodular objective function subject to certain constraints:

$$\max f(A) \quad \text{subject to } A \in \mathcal{C},$$

where $\mathcal{C} \subseteq 2^V$ is the family of feasible solutions. Indeed, the general meta-problem of
optimizing a constrained submodular function captures a wide variety of problems in machine learning
applications, including exemplar clustering, document summarization, sensor placement, image 
segmentation, maximum entropy sampling, and feature selection problems. 

*The authors are listed alphabetically.

^Work supported by EPSRC grant EP/J021814/1. 
At the same time, in many of these applications, the amount of data that is collected is quite large
and it is growing at a very fast pace. For example, the wide deployment of sensors has led to the 
collection of large amounts of measurements of the physical world. Similarly, medical data and 
human activity data are being captured and stored at an ever increasing rate and level of detail. 
This data is often high-dimensional and complex, and it needs to be stored and processed in a 
distributed fashion. 

In these settings, it is apparent that the classical algorithmic approaches are no longer suitable 
and new algorithmic insights are needed in order to cope with these challenges. The algorithmic 
challenges stem from the following competing demands imposed by huge datasets: the computations 
need to process the data that is distributed across several machines using a minimal amount of 
communication and synchronization across the machines, and at the same time deliver solutions 
that are competitive with the centralized solution on the entire dataset. 

The main question driving the current work is whether these competing goals can be reconciled. 
More precisely, can we deliver very good approximate solutions with minimal communication overhead?
Perhaps surprisingly, the answer is yes; there is a very simple distributed greedy algorithm
that is embarrassingly parallel and achieves provable, constant-factor, worst-case approximation
guarantees. Our algorithm can be easily implemented in a parallel model of computation such as
MapReduce [2].


1.1 Background and Related Work 


In the MapReduce model, there are m independent machines. Each of the machines has a limited 
amount of memory available. In our setting, we assume that the data is much larger than any 
single machine’s memory and so must be distributed across all of the machines. At a high level, a 
MapReduce computation proceeds in several rounds. In a given round, the data is shuffled among 
the machines. After the data is distributed, each of the machines performs some computation on 
the data that is available to it. The output of these computations is either returned as the final 
result or becomes the input to the next MapReduce round. We emphasize that the machines can 
only communicate and exchange data during the shuffle phase. 

In order to put our contributions in context, we briefly discuss two distributed greedy algorithms 
that achieve complementary trade-offs in terms of approximation guarantees and communication 
overhead. 


Mirzasoleiman et al. [10] give a distributed algorithm, called GreeDi, for maximizing a monotone
submodular function subject to a cardinality constraint. The GreeDi algorithm partitions the
data arbitrarily on the machines and on each machine it runs the classical Greedy algorithm to
select a feasible subset of the items on that machine. The Greedy solutions on these machines are
then placed on a single machine and the Greedy algorithm is used once more to select the final
solution. The GreeDi algorithm is very simple and embarrassingly parallel, but its worst-case
approximation guarantee^ is $1/\Theta(\min\{\sqrt{k}, m\})$, where $m$ is the number of machines and $k$ is
the cardinality constraint. Despite this, Mirzasoleiman et al. show that the GreeDi algorithm
achieves very good approximations for datasets with geometric structure.

^Mirzasoleiman et al. [10] give a family of instances where the approximation achieved is only $1/\min\{k, m\}$ if the
solution picked on each of the machines is the optimal solution for the set of items on the machine. These instances
are not hard for the GreeDi algorithm. We show in later sections that the GreeDi algorithm achieves a
$1/\Theta(\min\{\sqrt{k}, m\})$ approximation.

Kumar et al. [8] give distributed algorithms for maximizing a monotone submodular function
subject to a cardinality or more generally, a matroid constraint. Their algorithm combines the 
Threshold Greedy algorithm of |Tj with a sample and prune strategy. In each round, the algorithm 
samples a small subset of the elements that fit on a single machine and runs the Threshold Greedy 
algorithm on the sample in order to obtain a feasible solution. This solution is then used to prune 
some of the elements in the dataset and reduce the size of the ground set. The Sample&Prune
algorithms achieve constant factor approximation guarantees but they incur a higher communication
overhead. For a cardinality constraint, the number of rounds is a constant but for more
general constraints such as a matroid constraint, the number of rounds is $\Theta(\log \Delta)$, where $\Delta$ is the
maximum increase in the objective due to a single element. The maximum increase $\Delta$ can be much
larger than even the number of elements in the entire dataset, which makes the approach infeasible
for massive datasets.

On the negative side, Indyk et al. [S] studied coreset approaches to develop distributed algorithms 
for finding representative and yet diverse subsets in large collections. While succeeding in several 
measures, they also showed that their approach provably does not work for $k$-coverage, which is a
special case of submodular maximization with a cardinality constraint. 


1.2 Our Contribution 

In this paper, we show that we can achieve both the communication efficiency of the GreeDi 
algorithm and a provable, constant factor, approximation guarantee. Our algorithm is in fact the 
GreeDi algorithm with a very simple and crucial modification: instead of partitioning the data 
arbitrarily on the machines, we randomly partition the dataset. Our analysis may perhaps provide 
some theoretical justification for the very good empirical performance of the GreeDi algorithm 
that was established previously in the extensive experiments of [10]. It also suggests the approach
can deliver good performance in much wider settings than originally envisioned. 

The GreeDi algorithm was originally studied in the special case of monotone submodular
maximization under a cardinality constraint. In contrast, our analysis holds for any hereditary
constraint. Specifically, we show that our randomized variant of the GreeDi algorithm achieves a
constant factor approximation for any hereditary, constrained problem for which the classical
(centralized) Greedy algorithm achieves a constant factor approximation. This is the case not only
for cardinality constraints, but also for matroid constraints, knapsack constraints, and $p$-system
constraints [B], which generalize the intersection of $p$ matroid constraints. Table 1 gives the
approximation ratio $\alpha$ obtained by the greedy algorithm on a variety of problems, and the corresponding
constant factor obtained by our randomized GreeDi algorithm.

Additionally, we show that if the greedy algorithm satisfies a slightly stronger technical condition,
then our approach gives a constant factor approximation for constrained non-monotone submodular
maximization. This is indeed the case for all of the aforementioned specific classes of problems.
The resulting approximation ratios for non-monotone maximization problems are given in the last
column of Table 1.

Constraint  | $\alpha$                        | monotone approx. ($\frac{\alpha}{2}$) | non-monotone approx. ($\frac{\alpha}{4+2\alpha}$)
cardinality | $1 - \frac{1}{e} \approx 0.632$ | $\approx 0.316$                       | $\approx 0.12$
matroid     | $\frac{1}{2}$                   | $\frac{1}{4}$                         | $\frac{1}{10}$
knapsack    | $\approx 0.35$                  | $\approx 0.17$                        | $\approx 0.074$
$p$-system  | $\frac{1}{p+1}$                 | $\frac{1}{2(p+1)}$                    | $\frac{1}{2+4(p+1)}$

Table 1: New approximation results for randomized GreeDi for constrained monotone and
non-monotone submodular maximization.^


1.3 Preliminaries 

MapReduce Model. In a MapReduce computation, the data is represented as (key, value) pairs 
and it is distributed across $m$ machines. The computation proceeds in rounds. In a given round, the data
is processed in parallel on each of the machines by map tasks that output (key, value) pairs. These
pairs are then shuffled by reduce tasks; each reduce task processes all the (key, value) pairs with
a given key. The output of the reduce tasks either becomes the final output of the MapReduce 
computation or it serves as the input of the next MapReduce round. 

Submodularity. As noted in the introduction, a set function $f : 2^V \to \mathbb{R}_{\geq 0}$ is submodular if, for
all sets $A, B \subseteq V$,

$$f(A) + f(B) \geq f(A \cup B) + f(A \cap B).$$

A useful alternative characterization of submodularity can be formulated in terms of diminishing
marginal gains. Specifically, $f$ is submodular if and only if:

$$f(A \cup \{e\}) - f(A) \geq f(B \cup \{e\}) - f(B)$$

for all $A \subseteq B \subseteq V$ and $e \notin B$.
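As an illustration of the two equivalent definitions, the following minimal Python sketch brute-forces both inequalities on a tiny coverage function. The toy instance `SETS` and all helper names are hypothetical choices made for this example; they do not come from the paper, and the exhaustive check is only practical for very small ground sets.

```python
from itertools import combinations

# Hypothetical toy instance: each ground-set element covers some items.
SETS = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4, 5}}
V = frozenset(SETS)

def f(A):
    """Coverage function: number of items covered by the chosen elements."""
    covered = set()
    for e in A:
        covered |= SETS[e]
    return len(covered)

def subsets(ground):
    """All subsets of `ground`, as frozensets."""
    items = sorted(ground)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_submodular(f, ground):
    """Check f(A) + f(B) >= f(A | B) + f(A & B) for all pairs A, B."""
    all_sets = subsets(ground)
    return all(f(A) + f(B) >= f(A | B) + f(A & B)
               for A in all_sets for B in all_sets)

def has_diminishing_returns(f, ground):
    """Check f(A + e) - f(A) >= f(B + e) - f(B) for all A <= B and e not in B."""
    all_sets = subsets(ground)
    return all(f(A | {e}) - f(A) >= f(B | {e}) - f(B)
               for B in all_sets for A in all_sets if A <= B
               for e in ground - B)
```

On this instance both checks succeed, while a function such as `lambda A: len(A) ** 2` fails the first one.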

The Lovász extension $f^- : [0,1]^V \to \mathbb{R}_{\geq 0}$ of a submodular function $f$ is given by:

$$f^-(x) = \mathbb{E}_{\theta \in \mathcal{U}(0,1)}\left[f(\{i : x_i \geq \theta\})\right].$$

For any submodular function $f$, the Lovász extension $f^-$ satisfies the following properties: (1)
$f^-(\mathbf{1}_S) = f(S)$ for all $S \subseteq V$, (2) $f^-$ is convex, and (3) $f^-(c \cdot x) \geq c \cdot f^-(x)$ for any $c \in [0,1]$.
These three properties immediately give the following simple lemma:

Lemma 1. Let $S$ be a random set, and suppose that $\mathbb{E}[\mathbf{1}_S] = c \cdot p$ (for $c \in [0,1]$). Then,
$\mathbb{E}[f(S)] \geq c \cdot f^-(p)$.
^The best-known values of $\alpha$ are taken from [11] (cardinality), [S] (matroid and $p$-system), and [13] (knapsack).
In the case of a knapsack constraint, Wolsey in fact employs a slightly modified variant of the greedy algorithm. We
note that the modified algorithm still satisfies all technical conditions required for our analysis (in particular, those
for Lemma 2).
Algorithm 1 The standard greedy algorithm Greedy

$S \leftarrow \emptyset$
loop
    Let $C = \{e \in V \setminus S : S \cup \{e\} \in \mathcal{I}\}$
    Let $e = \arg\max_{e \in C} \{f(S \cup \{e\}) - f(S)\}$
    if $C = \emptyset$ or $f(S \cup \{e\}) - f(S) < 0$ then
        return $S$
    end if
    $S \leftarrow S \cup \{e\}$
end loop


Proof. We have: 


$$\mathbb{E}[f(S)] = \mathbb{E}[f^-(\mathbf{1}_S)] \geq f^-(\mathbb{E}[\mathbf{1}_S]) = f^-(c \cdot p) \geq c \cdot f^-(p),$$

where the first equality follows from property (1), the first inequality from property (2), and the 
final inequality from property (3). □ 

Hereditary Constraints. Our results hold quite generally for any problem which can be
formulated in terms of a hereditary constraint. Formally, we consider the problem

$$\max\{f(S) : S \subseteq V, S \in \mathcal{I}\}, \qquad (1)$$

where $f : 2^V \to \mathbb{R}_{\geq 0}$ is a submodular function and $\mathcal{I} \subseteq 2^V$ is a family of feasible subsets of $V$.
We require that $\mathcal{I}$ be hereditary in the sense that if some set is in $\mathcal{I}$, then so are all of its subsets.
Examples of common hereditary families include cardinality constraints ($\mathcal{I} = \{A \subseteq V : |A| \leq k\}$),
matroid constraints ($\mathcal{I}$ corresponds to the collection of independent sets of the matroid), knapsack
constraints ($\mathcal{I} = \{A \subseteq V : \sum_{i \in A} w_i \leq b\}$), as well as arbitrary combinations of such constraints.

Given some constraint $\mathcal{I} \subseteq 2^V$, we shall also consider restricted instances in which we are presented
only with a subset $V' \subseteq V$, and must find a set $S \subseteq V'$ with $S \in \mathcal{I}$ that maximizes $f$. We say that
an algorithm is an $\alpha$-approximation for maximizing a submodular function subject to a hereditary
constraint $\mathcal{I}$ if, for any submodular function $f : 2^V \to \mathbb{R}_{\geq 0}$ and any subset $V' \subseteq V$, the algorithm
produces a solution $S \subseteq V'$ with $S \in \mathcal{I}$, satisfying $f(S) \geq \alpha \cdot f(\mathrm{OPT})$, where $\mathrm{OPT} \in \mathcal{I}$ is any
feasible subset of $V'$.


2 The Standard Greedy Algorithm 

Before describing our general algorithm, let us recall the standard greedy algorithm, Greedy,
shown in Algorithm 1. The algorithm takes as input $(P, \mathcal{I}, f)$, where $P$ is a set of elements, $\mathcal{I} \subseteq 2^P$
is a hereditary constraint, represented as a membership oracle for $\mathcal{I}$, and $f : 2^P \to \mathbb{R}_{\geq 0}$ is a
non-negative submodular function, represented as a value oracle. Given $(P, \mathcal{I}, f)$, Greedy iteratively
constructs a solution $S \in \mathcal{I}$ by choosing at each step the element maximizing the marginal increase
of $f$. For some $A \subseteq P$, we let Greedy($A$) denote the set $S \in \mathcal{I}$ produced by the greedy algorithm
that considers only elements from $A$.
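The oracle-based description above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `greedy`, the tie-breaking by sorted order, and the toy coverage instance are assumptions introduced here, and elements are assumed to be sortable.

```python
def greedy(elements, is_feasible, f):
    """Standard greedy: repeatedly add the feasible element of largest marginal gain.

    elements    -- iterable of candidate elements (the set A in the text)
    is_feasible -- membership oracle: True iff the given set satisfies the constraint
    f           -- value oracle for a non-negative submodular function
    """
    S = frozenset()
    remaining = set(elements)
    while True:
        candidates = [e for e in remaining if is_feasible(S | {e})]
        if not candidates:
            return S
        # Ties are broken by sorted order so that runs are reproducible (an assumption).
        best = max(sorted(candidates), key=lambda e: f(S | {e}) - f(S))
        if f(S | {best}) - f(S) < 0:
            return S
        S = S | {best}
        remaining.discard(best)

# Hypothetical toy instance: maximum coverage with a cardinality constraint k = 2.
SETS = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4, 5}}

def coverage(S):
    """Number of items covered by the elements of S."""
    covered = set()
    for e in S:
        covered |= SETS[e]
    return len(covered)

solution = greedy(SETS, lambda S: len(S) <= 2, coverage)
```

Restricting the first argument to a subset of the ground set gives the restricted runs Greedy($A$) used throughout the analysis.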

The greedy algorithm satisfies the following property: 
Algorithm 2 The distributed algorithm RandGreeDi

for $e \in V$ do
    Assign $e$ to a machine $i$ chosen uniformly at random
end for
Let $V_i$ be the elements assigned to machine $i$
Run Greedy($V_i$) on each machine $i$ to obtain $S_i$
Place $S = \bigcup_i S_i$ on machine 1
Run Alg($S$) on machine 1 to obtain $T$
Let $S' = \arg\max_i \{f(S_i)\}$
return $\arg\max\{f(T), f(S')\}$

Lemma 2. Let $A \subseteq V$ and $B \subseteq V$ be two disjoint subsets of $V$. Suppose that, for each element
$e \in B$, we have Greedy($A \cup \{e\}$) = Greedy($A$). Then Greedy($A \cup B$) = Greedy($A$).

Proof. Suppose for contradiction that Greedy($A \cup B$) $\neq$ Greedy($A$). We first note that, if
Greedy($A \cup B$) $\subseteq A$, then Greedy($A \cup B$) = Greedy($A$); this follows from the fact that each
iteration of the Greedy algorithm chooses the element with the highest marginal value whose addition
to the current solution maintains feasibility for $\mathcal{I}$. Therefore, if Greedy($A \cup B$) $\neq$ Greedy($A$),
the former solution contains an element of $B$. Let $e$ be the first element of $B$ that is selected by
Greedy on the input $A \cup B$. Then Greedy will also select $e$ on the input $A \cup \{e\}$, which contradicts
the fact that Greedy($A \cup \{e\}$) = Greedy($A$). □

3 A Randomized, Distributed Greedy Algorithm for Monotone 
Submodular Maximization 

Algorithm. We now describe our general, randomized distributed algorithm, RandGreeDi,
shown in Algorithm 2. Suppose we have $m$ machines. Our algorithm runs in two rounds. In the
first round, we randomly distribute the elements of the ground set $V$ to the machines, assigning
each element to a machine chosen independently and uniformly at random. On each machine $i$, we
execute Greedy($V_i$) to select a feasible subset $S_i$ of the elements on that machine. In the second
round, we place all of these selected subsets on a single machine, and run some algorithm Alg on
this machine in order to select a final solution $T$. We return whichever is better: the final solution
$T$ or the best solution amongst all the $S_i$ from the first phase.
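The two rounds can be simulated in a single process, as in the following Python sketch. It is only an illustration under stated assumptions: the names `rand_greedi`, `greedy`, `coverage`, and `at_most_two` are hypothetical, the machines are modeled as in-memory lists rather than MapReduce tasks, Greedy itself is used for Alg by default, and the seed is fixed purely for reproducibility.

```python
import random

def greedy(elements, is_feasible, f):
    """Standard greedy restricted to `elements` (ties broken by sorted order)."""
    S = frozenset()
    while True:
        candidates = sorted(e for e in set(elements) - S if is_feasible(S | {e}))
        if not candidates:
            return S
        best = max(candidates, key=lambda e: f(S | {e}) - f(S))
        if f(S | {best}) - f(S) < 0:
            return S
        S = S | {best}

def rand_greedi(V, is_feasible, f, m, alg=greedy, seed=0):
    """Single-process simulation of the two-round scheme with m machines."""
    rng = random.Random(seed)
    # Round 1: each element goes to a machine chosen independently and uniformly at random.
    parts = [[] for _ in range(m)]
    for e in sorted(V):
        parts[rng.randrange(m)].append(e)
    local = [greedy(part, is_feasible, f) for part in parts]
    # Round 2: the union of the local solutions is placed on one machine.
    pooled = frozenset().union(*local)
    T = alg(pooled, is_feasible, f)
    # Return the better of T and the best first-round solution.
    best_local = max(local, key=f)
    return max([T, best_local], key=f)

# Hypothetical toy instance: maximum coverage with at most two chosen sets.
SETS = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4, 5}, "d": {5, 6}}

def coverage(S):
    """Number of items covered by the elements of S."""
    return len(set().union(*(SETS[e] for e in S)))

def at_most_two(S):
    """Membership oracle for the cardinality constraint k = 2."""
    return len(S) <= 2

solution = rand_greedi(SETS, at_most_two, coverage, m=3)
```

With `m=1` the simulation reduces to running the centralized greedy algorithm, which is a convenient sanity check.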

Analysis. We devote the rest of this section to the analysis of the RandGreeDi algorithm. Fix
$(V, \mathcal{I}, f)$, where $\mathcal{I} \subseteq 2^V$ is a hereditary constraint, and $f : 2^V \to \mathbb{R}_{\geq 0}$ is any non-negative, monotone
submodular function. Suppose that Greedy is an $\alpha$-approximation and Alg is a $\beta$-approximation
for the associated constrained monotone submodular maximization problem of the form (1). Let
$n = |V|$ and suppose that $\mathrm{OPT} = \arg\max_{A \in \mathcal{I}} f(A)$ is a feasible set maximizing $f$.

Let $\mathcal{V}(1/m)$ denote the distribution over random subsets of $V$ where each element is included
independently with probability $1/m$. Let $p \in [0,1]^n$ be the following vector. For each element
$e \in V$, we have

$$p_e = \begin{cases} \Pr_{A \sim \mathcal{V}(1/m)}\left[e \in \mathrm{Greedy}(A \cup \{e\})\right] & \text{if } e \in \mathrm{OPT} \\ 0 & \text{otherwise} \end{cases}$$

Our main theorem follows from the next two lemmas, which characterize the quality of the best 
solution from the first round and that of the solution from the second round, respectively. Recall 
that $f^-$ is the Lovász extension of $f$.


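Lemma 3. For each machine $i$, $\mathbb{E}[f(S_i)] \geq \alpha \cdot f^-(\mathbf{1}_{\mathrm{OPT}} - p)$.

Proof. Consider machine $i$. Let $V_i$ denote the set of elements assigned to machine $i$ in the first
round. Let $O_i = \{e \in \mathrm{OPT} : e \notin \mathrm{Greedy}(V_i \cup \{e\})\}$. We make the following key observations.

We apply Lemma 2 with $A = V_i$ and $B = O_i \setminus V_i$ to obtain that $\mathrm{Greedy}(V_i) = \mathrm{Greedy}(V_i \cup O_i) = S_i$.
Since $\mathrm{OPT} \in \mathcal{I}$ and $\mathcal{I}$ is hereditary, we must have $O_i \in \mathcal{I}$ as well. Since Greedy is an
$\alpha$-approximation, it follows that

$$f(S_i) \geq \alpha \cdot f(O_i).$$

Since the distribution of $V_i$ is the same as $\mathcal{V}(1/m)$, for each element $e \in \mathrm{OPT}$, we have

$$\Pr[e \in O_i] = 1 - \Pr[e \notin O_i] = 1 - p_e,$$
$$\mathbb{E}[\mathbf{1}_{O_i}] = \mathbf{1}_{\mathrm{OPT}} - p.$$

By combining these observations with Lemma 1, we obtain

$$\mathbb{E}[f(S_i)] \geq \alpha \cdot \mathbb{E}[f(O_i)] \geq \alpha \cdot f^-(\mathbf{1}_{\mathrm{OPT}} - p).$$

□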


Lemma 4. $\mathbb{E}[f(\mathrm{Alg}(S))] \geq \beta \cdot f^-(p)$.

Proof. Recall that $S = \bigcup_i \mathrm{Greedy}(V_i)$. Since $\mathrm{OPT} \in \mathcal{I}$ and $\mathcal{I}$ is hereditary, $S \cap \mathrm{OPT} \in \mathcal{I}$. Since
Alg is a $\beta$-approximation, we have

$$f(\mathrm{Alg}(S)) \geq \beta \cdot f(S \cap \mathrm{OPT}). \qquad (2)$$

Consider an element $e \in \mathrm{OPT}$. For each machine $i$, we have

$$\Pr[e \in S \mid e \text{ is assigned to machine } i] = \Pr[e \in \mathrm{Greedy}(V_i) \mid e \in V_i]$$
$$= \Pr_{A \sim \mathcal{V}(1/m)}[e \in \mathrm{Greedy}(A) \mid e \in A]$$
$$= \Pr_{B \sim \mathcal{V}(1/m)}[e \in \mathrm{Greedy}(B \cup \{e\})]$$
$$= p_e.$$

The first equality follows from the fact that $e$ is included in $S$ if and only if it is included in
Greedy($V_i$). The second equality follows from the fact that the distribution of $V_i$ is identical to
$\mathcal{V}(1/m)$. The third equality follows from the fact that the distribution of $A \sim \mathcal{V}(1/m)$ conditioned
on $e \in A$ is identical to the distribution of $B \cup \{e\}$ where $B \sim \mathcal{V}(1/m)$. Therefore

$$\Pr[e \in S \cap \mathrm{OPT}] = p_e,$$
$$\mathbb{E}[\mathbf{1}_{S \cap \mathrm{OPT}}] = p. \qquad (3)$$

By combining (2), (3), and Lemma 1, we obtain

$$\mathbb{E}[f(\mathrm{Alg}(S))] \geq \beta \cdot \mathbb{E}[f(S \cap \mathrm{OPT})] \geq \beta \cdot f^-(p).$$

□


Combining Lemma 3 and Lemma 4 gives us our main theorem.

Theorem 5. Suppose that Greedy is an $\alpha$-approximation algorithm and Alg is a $\beta$-approximation
algorithm for maximizing a monotone submodular function subject to a hereditary constraint $\mathcal{I}$.
Then RandGreeDi is (in expectation) an $\frac{\alpha\beta}{\alpha+\beta}$-approximation algorithm for the same problem.

Proof. Let $S_i = \mathrm{Greedy}(V_i)$, $S = \bigcup_i S_i$ be the set of elements on the last machine, and $T = \mathrm{Alg}(S)$
be the solution produced on the last machine. Then, the output $D$ produced by RandGreeDi
satisfies $f(D) \geq \max_i f(S_i)$ and $f(D) \geq f(T)$. Thus, from Lemmas 3 and 4 we have:

$$\mathbb{E}[f(D)] \geq \alpha \cdot f^-(\mathbf{1}_{\mathrm{OPT}} - p) \qquad (4)$$
$$\mathbb{E}[f(D)] \geq \beta \cdot f^-(p). \qquad (5)$$

By combining (4) and (5), that is, multiplying (4) by $\beta$ and (5) by $\alpha$ and adding, we obtain

$$(\beta + \alpha)\,\mathbb{E}[f(D)] \geq \alpha\beta\,(f^-(p) + f^-(\mathbf{1}_{\mathrm{OPT}} - p))$$
$$\geq \alpha\beta \cdot f^-(\mathbf{1}_{\mathrm{OPT}})$$
$$= \alpha\beta \cdot f(\mathrm{OPT}).$$

In the second inequality, we have used the fact that $f^-$ is convex and $f^-(c \cdot x) \geq c f^-(x)$ for any
constant $c \in [0,1]$. □

If we use the standard greedy algorithm for Alg, we obtain the following simplified corollary of
Theorem 5.

Corollary 6. Suppose that Greedy is an $\alpha$-approximation algorithm for maximizing a monotone
submodular function, and use Greedy as the algorithm Alg in RandGreeDi. Then, the resulting
algorithm is (in expectation) an $\frac{\alpha}{2}$-approximation algorithm for the same problem.

4 Non-Monotone Submodular Functions 

We consider the problem of maximizing a non-monotone submodular function subject to a
hereditary constraint. Our approach is a slight modification of the randomized, distributed greedy
algorithm described in Section 3 and it builds on the work of |1]. Again, we show how to combine the
standard Greedy algorithm, together with any algorithm Alg for the non-monotone case in order
to obtain a randomized, distributed algorithm for non-monotone submodular maximization.

Algorithm. Our modified algorithm, NMRandGreeDi, works as follows. As in the monotone
case, in the first round we distribute the elements of $V$ uniformly at random amongst the $m$
machines. Then, we run the standard greedy algorithm twice to obtain two disjoint solutions
$S_i^1$ and $S_i^2$ on each machine. Specifically, each machine first runs Greedy on $V_i$ to obtain a solution
$S_i^1$, then runs Greedy on $V_i \setminus S_i^1$ to obtain a disjoint solution $S_i^2$. In the second round, both of
these solutions are sent to a single machine, which runs Alg on $S = \bigcup_i (S_i^1 \cup S_i^2)$ to produce a
solution $T$. The best solution amongst $T$ and all of the solutions $S_i^1$ and $S_i^2$ is then returned.
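The modified first round can be sketched in Python as follows. As before, this is a single-process illustration under stated assumptions rather than the authors' code: the names `nm_rand_greedi`, `greedy`, `cut_value`, and `EDGES` are hypothetical, Greedy is used as a stand-in for Alg, and the objective is a small graph cut function, which is submodular but not monotone.

```python
import random

def greedy(elements, is_feasible, f):
    """Standard greedy restricted to `elements` (ties broken by sorted order)."""
    S = frozenset()
    while True:
        candidates = sorted(e for e in set(elements) - S if is_feasible(S | {e}))
        if not candidates:
            return S
        best = max(candidates, key=lambda e: f(S | {e}) - f(S))
        if f(S | {best}) - f(S) < 0:
            return S
        S = S | {best}

def nm_rand_greedi(V, is_feasible, f, m, alg=greedy, seed=0):
    """Single-process simulation: two disjoint greedy solutions per machine."""
    rng = random.Random(seed)
    parts = [[] for _ in range(m)]
    for e in sorted(V):
        parts[rng.randrange(m)].append(e)
    candidates, pooled = [], set()
    for part in parts:
        s1 = greedy(part, is_feasible, f)
        # Second run only sees the elements that the first run did not pick.
        s2 = greedy([e for e in part if e not in s1], is_feasible, f)
        candidates += [s1, s2]
        pooled |= s1 | s2
    # Second round: run Alg on the union of all first-round solutions.
    candidates.append(alg(pooled, is_feasible, f))
    return max(candidates, key=f)

# Hypothetical toy instance: cut function of the path 0 - 1 - 2 - 3.
EDGES = [(0, 1), (1, 2), (2, 3)]

def cut_value(S):
    """Number of edges with exactly one endpoint in S."""
    return sum(1 for u, v in EDGES if (u in S) != (v in S))

def at_most_two(S):
    """Membership oracle for the cardinality constraint k = 2."""
    return len(S) <= 2

solution = nm_rand_greedi(range(4), at_most_two, cut_value, m=2)
```

Because the greedy runs stop as soon as the best marginal gain is negative, the returned sets may be smaller than the constraint allows.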

Analysis. We devote the rest of this section to the analysis of the algorithm. In the following, we
assume that we are working with an instance $(V, \mathcal{I}, f)$ of non-negative, non-monotone submodular
maximization for which the Greedy algorithm has the following property:

For all $S \in \mathcal{I}$: $f(\mathrm{Greedy}(V)) \geq \alpha \cdot f(\mathrm{Greedy}(V) \cup S)$ (GP)

The standard analysis of the Greedy algorithm shows that (GP) is satisfied with constant $\alpha$ for
hereditary constraints such as matroids, knapsacks, and $p$-systems (see Table 1).


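The analysis is similar to the approach from the previous section. We define $\mathcal{V}(1/m)$ as before. We
modify the definition of the vector $p$ as follows. For each element $e \in V$, we have

$$p_e = \begin{cases} \Pr_{A \sim \mathcal{V}(1/m)}\left[e \in \mathrm{Greedy}(A \cup \{e\}) \text{ or } e \in \mathrm{Greedy}((A \cup \{e\}) \setminus \mathrm{Greedy}(A \cup \{e\}))\right] & \text{if } e \in \mathrm{OPT} \\ 0 & \text{otherwise} \end{cases}$$

We now derive analogues of Lemmas 3 and 4.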


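Lemma 7. Suppose that Greedy satisfies (GP). For each machine $i$,

$$\mathbb{E}\left[f(S_i^1) + f(S_i^2)\right] \geq \alpha \cdot f^-(\mathbf{1}_{\mathrm{OPT}} - p),$$

and therefore

$$\mathbb{E}\left[\max\{f(S_i^1), f(S_i^2)\}\right] \geq \frac{\alpha}{2} \cdot f^-(\mathbf{1}_{\mathrm{OPT}} - p).$$

Proof. Consider machine $i$ and let $V_i$ be the set of elements assigned to machine $i$ in the first round.
Let

$$O_i = \{e \in \mathrm{OPT} : e \notin \mathrm{Greedy}(V_i \cup \{e\}) \text{ and } e \notin \mathrm{Greedy}((V_i \cup \{e\}) \setminus \mathrm{Greedy}(V_i \cup \{e\}))\}.$$

Note that, since $\mathrm{OPT} \in \mathcal{I}$ and $\mathcal{I}$ is hereditary, we have $O_i \in \mathcal{I}$.

It follows from Lemma 2 that

$$S_i^1 = \mathrm{Greedy}(V_i) = \mathrm{Greedy}(V_i \cup O_i), \qquad (6)$$
$$S_i^2 = \mathrm{Greedy}(V_i \setminus S_i^1) = \mathrm{Greedy}((V_i \setminus S_i^1) \cup O_i). \qquad (7)$$

By combining the equations above with the greedy property (GP), we obtain

$$f(S_i^1) = f(\mathrm{Greedy}(V_i \cup O_i)) \geq \alpha \cdot f(\mathrm{Greedy}(V_i \cup O_i) \cup O_i) = \alpha \cdot f(S_i^1 \cup O_i), \qquad (8)$$
$$f(S_i^2) = f(\mathrm{Greedy}((V_i \setminus S_i^1) \cup O_i)) \geq \alpha \cdot f(\mathrm{Greedy}((V_i \setminus S_i^1) \cup O_i) \cup O_i) = \alpha \cdot f(S_i^2 \cup O_i). \qquad (9)$$

Now we observe that

$$f(S_i^1 \cup O_i) + f(S_i^2 \cup O_i) \geq f((S_i^1 \cup O_i) \cap (S_i^2 \cup O_i)) + f(S_i^1 \cup S_i^2 \cup O_i) \quad (f \text{ is submodular})$$
$$= f(O_i) + f(S_i^1 \cup S_i^2 \cup O_i) \quad (S_i^1 \cap S_i^2 = \emptyset)$$
$$\geq f(O_i). \quad (f \text{ is non-negative}) \qquad (10)$$

By combining (8), (9), and (10), we obtain

$$f(S_i^1) + f(S_i^2) \geq \alpha \cdot f(O_i). \qquad (11)$$

Since the distribution of $V_i$ is the same as $\mathcal{V}(1/m)$, for each element $e \in \mathrm{OPT}$, we have

$$\Pr[e \in O_i] = 1 - \Pr[e \notin O_i] = 1 - p_e,$$
$$\mathbb{E}[\mathbf{1}_{O_i}] = \mathbf{1}_{\mathrm{OPT}} - p. \qquad (12)$$

By combining (11), (12), and Lemma 1, we obtain

$$\mathbb{E}[f(S_i^1) + f(S_i^2)] \geq \alpha \cdot \mathbb{E}[f(O_i)] \quad \text{(By (11))}$$
$$\geq \alpha \cdot f^-(\mathbf{1}_{\mathrm{OPT}} - p). \quad \text{(By (12) and Lemma 1)}$$

□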


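Lemma 8. $\mathbb{E}[f(\mathrm{Alg}(S))] \geq \beta \cdot f^-(p)$.

Proof. Recall that $S_i^1 = \mathrm{Greedy}(V_i)$, $S_i^2 = \mathrm{Greedy}(V_i \setminus S_i^1)$, and $S = \bigcup_i (S_i^1 \cup S_i^2)$. Since $\mathrm{OPT} \in \mathcal{I}$
and $\mathcal{I}$ is hereditary, $S \cap \mathrm{OPT} \in \mathcal{I}$. Since Alg is a $\beta$-approximation, we have

$$f(\mathrm{Alg}(S)) \geq \beta \cdot f(S \cap \mathrm{OPT}). \qquad (13)$$

Consider an element $e \in \mathrm{OPT}$. For each machine $i$, we have

$$\Pr[e \in S \mid e \text{ is assigned to machine } i]$$
$$= \Pr[e \in \mathrm{Greedy}(V_i) \text{ or } e \in \mathrm{Greedy}(V_i \setminus \mathrm{Greedy}(V_i)) \mid e \in V_i]$$
$$= \Pr_{A \sim \mathcal{V}(1/m)}[e \in \mathrm{Greedy}(A) \text{ or } e \in \mathrm{Greedy}(A \setminus \mathrm{Greedy}(A)) \mid e \in A]$$
$$= \Pr_{B \sim \mathcal{V}(1/m)}[e \in \mathrm{Greedy}(B \cup \{e\}) \text{ or } e \in \mathrm{Greedy}((B \cup \{e\}) \setminus \mathrm{Greedy}(B \cup \{e\}))]$$
$$= p_e.$$

The first equality above follows from the fact that $e$ is included in $S$ iff $e$ is included in either $S_i^1$
or $S_i^2$. The second equality follows from the fact that the distribution of $V_i$ is the same as $\mathcal{V}(1/m)$.
The third equality follows from the fact that the distribution of $A \sim \mathcal{V}(1/m)$ conditioned on $e \in A$
is identical to the distribution of $B \cup \{e\}$ where $B \sim \mathcal{V}(1/m)$. Therefore

$$\Pr[e \in S \cap \mathrm{OPT}] = p_e,$$
$$\mathbb{E}[\mathbf{1}_{S \cap \mathrm{OPT}}] = p. \qquad (14)$$

By combining (13), (14), and Lemma 1, we obtain

$$\mathbb{E}[f(\mathrm{Alg}(S))] \geq \beta \cdot \mathbb{E}[f(S \cap \mathrm{OPT})] \geq \beta \cdot f^-(p).$$

□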

We can now combine Lemmas 7 and 8 to obtain our main result for non-monotone submodular
maximization.

Theorem 9. Consider the problem of maximizing a submodular function under some hereditary
constraint $\mathcal{I}$, and suppose that Greedy satisfies (GP) and Alg is a $\beta$-approximation algorithm
for this problem. Then NMRandGreeDi is (in expectation) an $\frac{\alpha\beta}{\alpha+2\beta}$-approximation algorithm for
the same problem.

Proof. Let $S_i^1 = \mathrm{Greedy}(V_i)$, $S_i^2 = \mathrm{Greedy}(V_i \setminus S_i^1)$, and $S = \bigcup_i (S_i^1 \cup S_i^2)$ be the set of elements
on the last machine, and $T = \mathrm{Alg}(S)$ be the solution produced on the last machine. Then, the
output $D$ produced by NMRandGreeDi satisfies $f(D) \geq \max_i \max\{f(S_i^1), f(S_i^2)\}$ and $f(D) \geq f(T)$.
Thus, from Lemmas 7 and 8 we have:

$$\mathbb{E}[f(D)] \geq \frac{\alpha}{2} \cdot f^-(\mathbf{1}_{\mathrm{OPT}} - p), \qquad (15)$$
$$\mathbb{E}[f(D)] \geq \beta \cdot f^-(p). \qquad (16)$$

By combining (15) and (16), that is, multiplying (15) by $2\beta$ and (16) by $\alpha$ and adding, we obtain

$$(2\beta + \alpha)\,\mathbb{E}[f(D)] \geq \alpha\beta\,[f^-(p) + f^-(\mathbf{1}_{\mathrm{OPT}} - p)]$$
$$\geq \alpha\beta \cdot f^-(\mathbf{1}_{\mathrm{OPT}})$$
$$= \alpha\beta \cdot f(\mathrm{OPT}).$$

In the second inequality, we have used the fact that $f^-$ is convex and $f^-(c \cdot x) \geq c f^-(x)$ for any
constant $c \in [0,1]$. □


We remark that one can use the following approach on the last machine |Tj. As in the first
round, we run Greedy twice to obtain two solutions $T_1 = \mathrm{Greedy}(S)$ and $T_2 = \mathrm{Greedy}(S \setminus T_1)$.
Additionally, we select a subset $T_3 \subseteq T_1$ using an unconstrained submodular maximization
algorithm on $T_1$, such as the Double Greedy algorithm of [1], which is a $\frac{1}{2}$-approximation. The final
solution $T$ is the best solution among $T_1, T_2, T_3$. If Greedy satisfies property (GP) then it follows
from the analysis of |lj that the resulting solution $T$ satisfies $f(T) \geq \frac{\alpha}{2(1+\alpha)} \cdot f(\mathrm{OPT})$. This gives
us the following corollary of Theorem 9.
Corollary 10. Consider the problem of maximizing a submodular function subject to some
hereditary constraint $\mathcal{I}$ and suppose that Greedy satisfies (GP) for this problem. Let Alg be the
algorithm described above that uses Greedy twice and Double Greedy. Then NMRandGreeDi
achieves (in expectation) an $\frac{\alpha}{4+2\alpha}$-approximation for the same problem.

Proof. By (GP) and the approximation guarantee of the Double Greedy algorithm, we have:

$$f(T) \geq f(T_1) \geq \alpha \cdot f(T_1 \cup \mathrm{OPT}) \qquad (17)$$
$$f(T) \geq f(T_2) \geq \alpha \cdot f(T_2 \cup (\mathrm{OPT} \setminus T_1)) \qquad (18)$$
$$f(T) \geq f(T_3) \geq \frac{1}{2} f(T_1 \cap \mathrm{OPT}). \qquad (19)$$

Additionally, from [H Lemma 2], we have:

$$f(T_1 \cup \mathrm{OPT}) + f(T_2 \cup (\mathrm{OPT} \setminus T_1)) + f(T_1 \cap \mathrm{OPT}) \geq f(\mathrm{OPT}).$$

By combining the inequalities above, we obtain:

$$(1 + \alpha) f(T) \geq \frac{\alpha}{2}\left(f(T_1 \cup \mathrm{OPT}) + f(T_2 \cup (\mathrm{OPT} \setminus T_1)) + f(T_1 \cap \mathrm{OPT})\right) \geq \frac{\alpha}{2} f(\mathrm{OPT})$$

and hence $f(T) \geq \frac{\alpha}{2(1+\alpha)} \cdot f(\mathrm{OPT})$ as claimed. Setting $\beta = \frac{\alpha}{2(1+\alpha)}$ in Theorem 9, we obtain an
approximation ratio of $\frac{\alpha}{4+2\alpha}$. □
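As a quick numerical check of how the guarantees compose, the following Python sketch evaluates the bounds obtained in the proofs of the two main theorems, namely $\alpha\beta/(\alpha+\beta)$ in the monotone case and $\alpha\beta/(\alpha+2\beta)$ in the non-monotone case, for the values of $\alpha$ listed in Table 1. The function names are hypothetical, and the knapsack value 0.35 is the rounded figure from the table rather than an exact constant.

```python
import math

def monotone_ratio(alpha, beta):
    """Bound from the monotone analysis: alpha * beta / (alpha + beta)."""
    return alpha * beta / (alpha + beta)

def non_monotone_ratio(alpha, beta):
    """Bound from the non-monotone analysis: alpha * beta / (alpha + 2 * beta)."""
    return alpha * beta / (alpha + 2 * beta)

def last_machine_beta(alpha):
    """Guarantee alpha / (2 * (1 + alpha)) of the greedy-twice plus Double Greedy routine."""
    return alpha / (2 * (1 + alpha))

# Values of alpha for the centralized greedy algorithm, as listed in Table 1.
ALPHAS = [("cardinality", 1 - 1 / math.e), ("matroid", 0.5), ("knapsack", 0.35)]

for name, alpha in ALPHAS:
    mono = monotone_ratio(alpha, alpha)  # Alg = Greedy, so beta = alpha
    non_mono = non_monotone_ratio(alpha, last_machine_beta(alpha))
    print(f"{name}: alpha={alpha:.3f} monotone={mono:.3f} non-monotone={non_mono:.3f}")
```

Choosing $\beta = \alpha$ collapses the first bound to $\alpha/2$, and choosing $\beta = \alpha/(2(1+\alpha))$ collapses the second to $\alpha/(4+2\alpha)$, which is how the last two columns of the table arise.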


5 Experiments 


We experimentally evaluate and compare the following distributed algorithms for maximizing a
monotone submodular function subject to a cardinality constraint: the RandGreeDi algorithm
described in Section 3, the deterministic GreeDi algorithm of [10], and the Sample&Prune
algorithm of [8]. We run these algorithms in several scenarios and we evaluate their performance
relative to the centralized Greedy solution on the entire dataset.


Exemplar based clustering. Our experimental setup is similar to that of [10]. Our goal is to
find a representative set of objects from a dataset by solving a $k$-medoid problem [7] that aims
to minimize the sum of pairwise dissimilarities between the chosen objects and the entire dataset.
Let $V$ denote the set of objects in the dataset and let $d : V \times V \to \mathbb{R}$ be a dissimilarity function;
we assume that $d$ is symmetric, that is, $d(i, j) = d(j, i)$ for each pair $i, j$. Let $L : 2^V \to \mathbb{R}$ be the
function such that $L(A) = \frac{1}{|V|} \sum_{v \in V} \min_{a \in A} d(a, v)$ for each set $A \subseteq V$. We can turn the problem
of minimizing $L$ into the problem of maximizing a monotone submodular function $f$ by introducing
an auxiliary element $v_0$ and by defining $f(S) = L(\{v_0\}) - L(S \cup \{v_0\})$ for each set $S \subseteq V$.
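The construction of $f$ from $L$ can be sketched as follows in Python. This is a hedged illustration: it assumes that $L(A)$ is the average, over all objects, of the dissimilarity to the closest chosen object, and the function name `make_exemplar_objective`, the one-dimensional points, and the squared-difference dissimilarity are hypothetical choices made only for this example.

```python
def make_exemplar_objective(points, dist, v0):
    """Return f(S) = L({v0}) - L(S + {v0}), where L(A) is the mean over all points v
    of the dissimilarity between v and its closest element of A (an assumed form of L)."""
    def L(A):
        return sum(min(dist(a, v) for a in A) for v in points) / len(points)

    base = L([v0])  # loss when only the auxiliary element is available

    def f(S):
        return base - L(list(S) + [v0])

    return f

# Hypothetical toy data: three points on a line, auxiliary element at the origin.
points = [1.0, 2.0, 10.0]
f = make_exemplar_objective(points, lambda x, y: (x - y) ** 2, v0=0.0)

gain_empty = f(set())          # no exemplars chosen: zero by construction
gain_one = f({10.0})           # one exemplar placed on the outlying point
gain_two = f({10.0, 1.0})      # adding a second exemplar can only help
```

Adding the auxiliary element guarantees that $f$ is non-negative and that it vanishes on the empty set, which is what the greedy analysis needs.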

Tiny Images experiments: In our experiments, we used a subset of the Tiny Images dataset
consisting of 32 × 32 RGB images [12], each represented as a 3,072-dimensional vector. We subtracted from
each vector the mean value and normalized the result, to obtain a collection of 3,072-dimensional
vectors of unit norm. We considered the distance function $d(x, y) = \|x - y\|^2$ for every pair $x, y$ of
vectors. We used the zero vector as the auxiliary element $v_0$ in the definition of $f$.

In our smaller experiments, we used 10,000 tiny images, and compared the utility of each algorithm
to that of the centralized greedy. The results are summarized in Figures 1(c) and 1(f).


[Figure 1: Experimental Results. Panels: (a) Kosarak dataset; (b) accidents dataset; (c) 10K tiny images;
(d) Kosarak dataset; (e) accidents dataset; (f) 10K tiny images; (g) synthetic diverse-yet-relevant
instance ($n = 10000$, $\lambda = n/k$); (h) synthetic hard instance for GreeDi; (i) 1M tiny images;
(j) matroid coverage ($n = 900$, $r = 5$); (k) matroid coverage ($n = 100$, $r = 100$).
Recoverable axis labels: Performance/Centralized, Performance/Optimal, and $k = m$.]
In our large scale experiments, we used one million tiny images, and $m = 100$ machines. In the
first round of the distributed algorithm, each machine ran the Greedy algorithm to maximize a
restricted objective function $f$, which is based on the average dissimilarity $L$ taken over only those
images assigned to that machine. Similarly, in the second round, the final machine maximized
an objective function $f$ based on the total dissimilarity of all those images it received. We also
considered a variant similar to that described by [10], in which 10,000 additional random images
from the original dataset were added to the final machine. The results are summarized in Figure 1(i).

Remark on the function evaluation. In decomposable cases such as exemplar clustering, the 
function is a sum of distances over all points in the dataset. By concentration results such as 
Chernoff bounds, the sum can be approximated additively with high probability by sampling a 
few points and using the (scaled) empirical sum. The random subset each machine receives can 
readily serve as the samples for the above approximation. Thus the random partition is useful
for evaluating the function in a distributed fashion, in addition to its algorithmic benefits.


Maximum Coverage experiments. We ran several experiments using instances of the Maximum 
Coverage problem. In the Maximum Coverage problem, we are given a collection $\mathcal{C} \subseteq 2^V$ of subsets
of a ground set V and an integer k, and the goal is to select k of the subsets in C that cover as 
many elements as possible. 


Kosarak and accidents datasets^: We evaluated and compared the algorithms on the datasets used
by Kumar et al. [8]. In both cases, we computed the optimal centralized solution using CPLEX, and
calculated the actual performance ratio attained by the algorithms. The results are summarized in
Figures 1(a), 1(d), 1(b), 1(e).


Synthetic hard instances: We generated a synthetic dataset with hard instances for the deterministic 
GreeDi. We describe the instances in Section We ran the GreeDi algorithm with a worst-case 
partition of the data. The results are summarized in Figure 1(h).


Finding diverse yet relevant items. We evaluated our NMRandGreeDi algorithm on the
following instance of non-monotone submodular maximization subject to a cardinality constraint.
We used the objective function of Lin and Bilmes [9]: $f(A) = \sum_{i \in V} \sum_{j \in A} s_{ij} - \lambda \sum_{i, j \in A} s_{ij}$, where
$\lambda$ is a redundancy parameter and $\{s_{ij}\}_{ij}$ is a similarity matrix. We generated an $n \times n$ similarity
matrix with random entries $s_{ij} \in \mathcal{U}(0, 100)$ and we set $\lambda = n/k$. The results are summarized in
Figure 1(g).


Matroid constraints. In order to evaluate our algorithm on a matroid constraint, we considered 
the following variant of maximum coverage: we are given a space containing several demand points 
and n facilities (e.g. wireless access points or sensors). Each facility can operate in one of r modes, 
each with a distinct coverage profile. The goal is to find a subset of at most k facilities to activate, 
along with a single mode for each activated facility, so that the total number of demand points 
covered is maximized. In our experiment, we placed 250,000 demand points in a grid in the unit 
square, together with a grid of n facilities. We modeled coverage profiles as ellipses centered at each 
facility with major axes of length $0.1\ell$, minor axes of length $0.1/\ell$, rotated by $\rho$, where $\ell \in \mathcal{N}(3, \frac{1}{3})$
and $\rho \in U(0, 2\pi)$ are chosen randomly for each ellipse. We performed two series of experiments. In
the first, there were n = 900 facilities, each with r = 5 coverage profiles, while in the second there
were n = 100 facilities, each with r = 100 coverage profiles.


^The data is available at http://fimi.ua.ac.be/data/
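
Whether a demand point is covered by a given ellipse profile reduces to a rotated point-in-ellipse test. The following is a minimal sketch of that test, not the code used in the experiments. It takes the stretch factor and rotation angle as inputs rather than sampling them, and it assumes that the stated axis lengths are full axis lengths, so the semi-axes are half of them. The function name and parameter names are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def covers(point: Point, center: Point, stretch: float, angle: float,
           base_length: float = 0.1) -> bool:
    """True if `point` lies inside the ellipse centered at `center`.

    The ellipse has major axis base_length * stretch and minor axis
    base_length / stretch, rotated counter-clockwise by `angle` radians.
    """
    dx = point[0] - center[0]
    dy = point[1] - center[1]
    # Express the offset in the ellipse's own (rotated) coordinate frame.
    u = dx * math.cos(angle) + dy * math.sin(angle)
    v = -dx * math.sin(angle) + dy * math.cos(angle)
    # Assumption: axis lengths are full lengths, so halve them for semi-axes.
    semi_major = base_length * stretch / 2.0
    semi_minor = base_length / stretch / 2.0
    return (u / semi_major) ** 2 + (v / semi_minor) ** 2 <= 1.0


facility = (0.5, 0.5)
print(covers((0.59, 0.5), facility, stretch=2.0, angle=0.0))
print(covers((0.5, 0.59), facility, stretch=2.0, angle=0.0))
```

The coverage value of a set of chosen ellipses is then the number of demand points for which at least one chosen ellipse passes this test.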


The resulting problem instances were represented as a ground set comprising a list of ellipses, each
with a designated facility, together with a partition matroid constraint ensuring that at most one 
ellipse per facility was chosen. As in our large-scale exemplar-based clustering experiments, we 
considered 3 approaches for assigning ellipses to machines: assigning consecutive blocks of ellipses 
to each machine, assigning ellipses to machines in round-robin fashion, and assigning ellipses to 
machines uniformly at random. The results are summarized in Figures 1(j) and 1(k); in these
plots, GREEDi(rr) and GREEDi(block) denote the results of GreeDi when we assign the ellipses 
to machines deterministically in a round-robin fashion and in consecutive blocks, respectively. 


In general, our experiments show that random and round robin are the best allocation strategies. 
One explanation for this phenomenon is that both of these strategies ensure that each machine 
receives a few elements from several distinct partitions in the first round. This allows each machine 
to return a solution containing several elements. 
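
The three allocation strategies, and the effect described above, can be illustrated with a short sketch. It assumes the ground set is listed facility by facility, as (facility, mode) pairs; the function names and the toy sizes are hypothetical.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def block_partition(items: Sequence[T], m: int) -> List[List[T]]:
    """Assign consecutive blocks of items to each of the m machines."""
    size = -(-len(items) // m)  # ceiling division
    return [list(items[i * size:(i + 1) * size]) for i in range(m)]


def round_robin_partition(items: Sequence[T], m: int) -> List[List[T]]:
    """Assign item t to machine t mod m."""
    return [list(items[i::m]) for i in range(m)]


def random_partition(items: Sequence[T], m: int, seed: int = 0) -> List[List[T]]:
    """Assign each item to a machine chosen uniformly at random."""
    rng = random.Random(seed)
    parts: List[List[T]] = [[] for _ in range(m)]
    for item in items:
        parts[rng.randrange(m)].append(item)
    return parts


def distinct_facilities(part) -> int:
    """Number of different facilities represented on one machine."""
    return len({facility for facility, _mode in part})


# Toy ground set: 4 facilities with 3 modes each, spread over 4 machines.
ellipses = [(f, r) for f in range(4) for r in range(3)]
print([distinct_facilities(p) for p in block_partition(ellipses, 4)])
print([distinct_facilities(p) for p in round_robin_partition(ellipses, 4)])
```

In this toy case every machine under the block strategy holds all modes of a single facility, so the partition matroid lets it return only one ellipse, whereas under round-robin each machine sees three facilities.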


Acknowledgements. We thank Moran Feldman for suggesting a modification to our original 
analysis that led to the simpler and stronger analysis included in this version of the paper. 


A Improved Deterministic GreeDI analysis

Let OPT be an arbitrary collection of k elements from V, and let M be the set of machines that
have some element of OPT placed on them. For each $j \in M$ let $O_j$ be the set of elements of OPT
placed on machine j, and let $r_j = |O_j|$ (note that $\sum_{j \in M} r_j = k$). Similarly, let $E_j$ be the set of
elements returned by the greedy algorithm on machine j. Let $e_j^i \in E_j$ denote the element chosen
in the ith round of the greedy algorithm on machine j, and let $E_j^i$ denote the set of all elements
chosen in rounds 1 through i. Finally, let $E = \bigcup_{j \in M} E_j$, and $E^i = \bigcup_{j \in M} E_j^i$.

We consider the marginal values:

\[ x_j^i = f_{E_j^{i-1}}(e_j^i) = f(E_j^i) - f(E_j^{i-1}), \]

\[ y_j^i = f_{E_j^{i-1}}(O_j), \]

for each $1 \le i \le k$. Note that because each element $e_j^i$ was selected in the ith round of the
greedy algorithm on machine j, we must have

\[ x_j^i \ge \max_{o \in O_j} f_{E_j^{i-1}}(o) \ge \frac{y_j^i}{r_j} \tag{20} \]

for all $j \in M$ and $i \in [k]$. Moreover, the sequence $x_j^1, \dots, x_j^k$ is non-increasing for all $j \in M$. Finally,
define $x_j^{k+1} = y_j^{k+1} = 0$ and $E_j^{k+1} = E_j$ for all j. We are now ready to prove our main claim.

Theorem 11. Let $\widetilde{\mathrm{OPT}} \subseteq E$ be a set of k elements from E that maximizes f. Then,

\[ f(\mathrm{OPT}) \le 2\sqrt{k} \cdot f(\widetilde{\mathrm{OPT}}). \]

Proof. For every $i \in [k]$ we have

\[
\begin{aligned}
f(\mathrm{OPT}) &\le f(\mathrm{OPT} \cup E^i) \\
&= f(E^i) + f_{E^i}(\mathrm{OPT}) \\
&\le f(E^i) + \sum_{j \in M} f_{E^i}(O_j) \\
&\le f(E^i) + \sum_{j \in M} f_{E_j^i}(O_j),
\end{aligned}
\tag{21}
\]

where the first inequality follows from monotonicity of f, and the last two from submodularity of f.

Let $i \le k$ be the smallest value such that

\[ \sum_{j \in M} r_j \cdot x_j^{i+1} \le \sqrt{k} \cdot \left[ f(E^{i+1}) - f(E^i) \right]. \tag{22} \]

Note that some such value i must exist, since for i = k, both sides are equal to zero. We now
derive a bound on each term on the right of (21).

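Lemma 12. $f(E^i) \le \sqrt{k} \cdot f(\widetilde{\mathrm{OPT}})$.

Proof. Because i is the smallest value for which (22) holds, we must have

\[ \sum_{j \in M} r_j \cdot x_j^{\ell} > \sqrt{k} \cdot \left[ f(E^{\ell}) - f(E^{\ell-1}) \right], \quad \text{for all } \ell \le i. \]

Therefore,

\[
\begin{aligned}
\sum_{j \in M} r_j \cdot f(E_j^i)
&= \sum_{j \in M} r_j \sum_{\ell=1}^{i} \left[ f(E_j^{\ell}) - f(E_j^{\ell-1}) \right] \\
&= \sum_{j \in M} \sum_{\ell=1}^{i} r_j \cdot x_j^{\ell} \\
&= \sum_{\ell=1}^{i} \sum_{j \in M} r_j \cdot x_j^{\ell} \\
&> \sum_{\ell=1}^{i} \sqrt{k} \cdot \left[ f(E^{\ell}) - f(E^{\ell-1}) \right] \\
&= \sqrt{k} \cdot f(E^i),
\end{aligned}
\]

and so

\[
f(E^i) < \frac{1}{\sqrt{k}} \sum_{j \in M} r_j \cdot f(E_j^i)
\le \frac{1}{\sqrt{k}} \sum_{j \in M} r_j \cdot f(E_j)
\le \frac{1}{\sqrt{k}} \sum_{j \in M} r_j \cdot f(\widetilde{\mathrm{OPT}})
= \sqrt{k} \cdot f(\widetilde{\mathrm{OPT}}).
\]

□

Lemma 13. $\sum_{j \in M} f_{E_j^i}(O_j) \le \sqrt{k} \cdot f(\widetilde{\mathrm{OPT}})$.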


Proof. We consider two cases: 
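
Case: i < k. We have $i + 1 \le k$, and by (20) we have $f_{E_j^i}(O_j) = y_j^{i+1} \le r_j \cdot x_j^{i+1}$ for every machine
j. Therefore:

\[
\begin{aligned}
\sum_{j \in M} f_{E_j^i}(O_j) &\le \sum_{j \in M} r_j \cdot x_j^{i+1} \\
&\le \sqrt{k} \cdot \left( f(E^{i+1}) - f(E^i) \right) \\
&= \sqrt{k} \cdot f_{E^i}(E^{i+1} \setminus E^i) \\
&\le \sqrt{k} \cdot f(E^{i+1} \setminus E^i) \\
&\le \sqrt{k} \cdot f(\widetilde{\mathrm{OPT}}).
\end{aligned}
\]

Case: i = k. By submodularity of f and (20), we have

\[ f_{E_j^k}(O_j) \le f_{E_j^{k-1}}(O_j) = y_j^k \le r_j \cdot x_j^k. \]

Moreover, since the sequence $x_j^1, \dots, x_j^k$ is nonincreasing for all j,

\[ x_j^k \le \frac{1}{k} \sum_{\ell=1}^{k} x_j^{\ell} = \frac{1}{k} \cdot f(E_j). \]

Therefore,

\[
\sum_{j \in M} f_{E_j^k}(O_j) \le \sum_{j \in M} \frac{r_j}{k} \cdot f(E_j)
\le \sum_{j \in M} \frac{r_j}{k} \cdot f(\widetilde{\mathrm{OPT}}) = f(\widetilde{\mathrm{OPT}}).
\]

Thus, in both cases, we have $\sum_{j \in M} f_{E_j^i}(O_j) \le \sqrt{k} \cdot f(\widetilde{\mathrm{OPT}})$ as required.

□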


Applying Lemmas 12 and 13 to the right of (21), we obtain

\[ f(\mathrm{OPT}) \le 2\sqrt{k} \cdot f(\widetilde{\mathrm{OPT}}), \]

completing the proof of Theorem 11. □

Corollary 14. The distributed greedy algorithm gives a $\frac{1 - 1/e}{2\sqrt{k}}$ approximation for maximizing a
monotone submodular function subject to a cardinality constraint k, regardless of how the elements
are distributed.
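
The step from Theorem 11 to Corollary 14 can be written out in one line. The sketch below assumes that the final round runs the standard greedy algorithm on $E$ and uses the classical $(1 - 1/e)$ guarantee of greedy for monotone submodular maximization under a cardinality constraint, a known result that is not proved in this appendix. Here $S$ is a name introduced only for this illustration, denoting the set returned by the final round, and $\widetilde{\mathrm{OPT}}$ is the best set of k elements from $E$, as in Theorem 11.

```latex
\[
f(S) \;\ge\; \left(1 - \frac{1}{e}\right) f(\widetilde{\mathrm{OPT}})
     \;\ge\; \frac{1 - 1/e}{2\sqrt{k}} \, f(\mathrm{OPT}).
\]
```

The first inequality is the greedy guarantee applied to the ground set $E$; the second is Theorem 11 divided through by $2\sqrt{k}$. Neither step depends on how the elements were assigned to machines.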


B A tight example for Deterministic GreeDI 

Here we give a family of examples that show that the GreeDI algorithm of Mirzasoleiman et al. 
cannot achieve an approximation better than $1/\sqrt{k}$.

Consider the following instance of Max k-Coverage. We have $\ell^2 + 1$ machines and $k = \ell + \ell^2$. Let N
be a ground set with $\ell^2 + \ell^3$ elements, $N = \{1, 2, \dots, \ell^2 + \ell^3\}$. We define a coverage function on a
collection S of subsets of N as follows. In the following, we define how the sets of S are partitioned
on the machines.

On machine 1, we have the following $\ell$ sets from OPT: $O_1 = \{1, 2, \dots, \ell\}$, $O_2 = \{\ell + 1, \dots, 2\ell\}$,
..., $O_\ell = \{\ell^2 - \ell + 1, \dots, \ell^2\}$. We also pad the machine with copies of the empty set.

On machine $i > 1$, we have the following sets. There is a single set from OPT, namely $O'_i =
\{\ell^2 + (i - 1)\ell + 1, \ell^2 + (i - 1)\ell + 2, \dots, \ell^2 + i\ell\}$. Additionally, we have $\ell$ sets that are designed to
fool the greedy algorithm; the j-th such set is $O_j \cup \{\ell^2 + (i - 1)\ell + j\}$. As before, we pad the
machine with copies of the empty set.

The optimal solution is $O_1, \dots, O_\ell, O'_1, \dots, O'_{\ell^2}$ and it has a total coverage of $\ell^2 + \ell^3$.

On the first machine, Greedy picks the $\ell$ sets $O_1, \dots, O_\ell$ from OPT and $\ell^2$ copies of the empty
set. On each machine $i > 1$, Greedy first picks the $\ell$ sets $A_j = O_j \cup \{\ell^2 + (i - 1)\ell + j\}$, since each
of them has marginal value greater than that of $O'_i$. Once Greedy has picked all of the $A_j$'s, the marginal
value of $O'_i$ becomes zero and we may assume that Greedy always picks the empty sets instead of $O'_i$.

Now consider the final round of the algorithm where we run Greedy on the union of the solutions
from each of the machines. In this round, regardless of the algorithm, the sets picked can only cover
$\{1, \dots, \ell^2\}$ (using the sets $O_1, \dots, O_\ell$) and one additional item per set for a total of $2\ell^2$ elements.
Thus the total coverage of the final solution is at most $2\ell^2$. Hence the approximation is at most

\[ \frac{2\ell^2}{\ell^2 + \ell^3} = \frac{2}{1 + \ell} \approx \frac{1}{\sqrt{k}}. \]
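
The construction can be checked on a small case by simulating both rounds. The following is a sketch under stated assumptions, not the authors' code. The machines holding the fooling sets are indexed m = 1, ..., ℓ² here, so that their blocks exactly tile the elements ℓ² + 1, ..., ℓ² + ℓ³. Each machine is padded with ℓ² empty sets, and ties are broken in list order so that empty sets are preferred over the hidden optimal set, as the text allows. All function and variable names are hypothetical.

```python
from typing import FrozenSet, List


def greedy(candidates: List[FrozenSet[int]], k: int) -> List[FrozenSet[int]]:
    """Greedy for coverage; ties go to the earliest candidate in list order."""
    remaining = list(candidates)
    covered: set = set()
    picked: List[FrozenSet[int]] = []
    for _ in range(min(k, len(remaining))):
        best = max(range(len(remaining)),
                   key=lambda t: len(remaining[t] - covered))
        chosen = remaining.pop(best)
        picked.append(chosen)
        covered |= chosen
    return picked


def tight_instance(ell: int) -> List[List[FrozenSet[int]]]:
    """Sets held by each machine; index 0 plays the role of machine 1."""
    empty: FrozenSet[int] = frozenset()
    # O_1, ..., O_ell partition {1, ..., ell^2} into blocks of size ell.
    base = [frozenset(range(j * ell + 1, (j + 1) * ell + 1)) for j in range(ell)]
    machines = [base + [empty] * ell**2]
    for m in range(1, ell**2 + 1):
        start = ell**2 + (m - 1) * ell  # this machine's block is start+1..start+ell
        fooling = [base[j] | {start + j + 1} for j in range(ell)]
        opt_set = frozenset(range(start + 1, start + ell + 1))
        # Empty sets come before opt_set so zero-gain ties never select it.
        machines.append(fooling + [empty] * ell**2 + [opt_set])
    return machines


def cover(sets: List[FrozenSet[int]]) -> int:
    """Number of elements covered by a list of sets."""
    return len(set().union(*sets))


ell = 3
k = ell + ell**2
machines = tight_instance(ell)
pool = [s for machine in machines for s in greedy(machine, k)]
final = greedy(pool, k)
optimum = machines[0][:ell] + [machine[-1] for machine in machines[1:]]
print(cover(final), cover(optimum))
```

In this sketch none of the hidden optimal sets survives the first round, so the final solution covers ℓ² elements plus one extra element per selected set, while the optimum covers all ℓ² + ℓ³ elements. This agrees with the bound above up to a lower-order term.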